Overview

Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   447         436
Missing cells (%)               8.4%        8.1%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
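
The missing-cell totals can be cross-checked against the per-variable missing counts that this report lists for Age, Cabin, and Embarked (the only variables with missing values). The sketch below is an illustrative recomputation in plain Python, not the code ydata-profiling runs internally; the dictionary and variable names are invented for the example.

```python
# Per-variable missing counts copied from the Variables section of this report.
missing_per_variable = {
    "Dataset A": {"Age": 95, "Cabin": 350, "Embarked": 2},
    "Dataset B": {"Age": 86, "Cabin": 349, "Embarked": 1},
}
n_variables = 12
n_observations = 446
total_cells = n_variables * n_observations  # 5352 cells per dataset

missing_summary = {}
for name, counts in missing_per_variable.items():
    missing_cells = sum(counts.values())
    missing_pct = 100 * missing_cells / total_cells
    missing_summary[name] = (missing_cells, round(missing_pct, 1))
    print(f"{name}: {missing_cells} missing cells ({missing_pct:.1f}%)")
```

Both results agree with the reported figures, which suggests that Missing cells (%) is taken over all cells (variables times observations) rather than over rows.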

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                              Dataset B                              Alert type
Age has 95 (21.3%) missing values      Age has 86 (19.3%) missing values      Missing
Cabin has 350 (78.5%) missing values   Cabin has 349 (78.3%) missing values   Missing
PassengerId has unique values          PassengerId has unique values          Unique
Name has unique values                 Name has unique values                 Unique
SibSp has 306 (68.6%) zeros            SibSp has 307 (68.8%) zeros            Zeros
Parch has 340 (76.2%) zeros            Parch has 341 (76.5%) zeros            Zeros
Fare has 6 (1.3%) zeros                Fare has 7 (1.6%) zeros                Zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2024-05-07 00:00:52.171376   2024-05-07 00:00:56.179014
Analysis finished        2024-05-07 00:00:56.177861   2024-05-07 00:00:59.194845
Duration                 4.01 seconds                 3.02 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
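
As a quick consistency check, the reported durations can be recovered from the start and finish timestamps in the Reproduction table. This is a minimal sketch using only the standard library; the `runs` and `durations` names are illustrative and not part of the report.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction table.
runs = {
    "Dataset A": ("2024-05-07 00:00:52.171376", "2024-05-07 00:00:56.177861"),
    "Dataset B": ("2024-05-07 00:00:56.179014", "2024-05-07 00:00:59.194845"),
}

durations = {}
for name, (started, finished) in runs.items():
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    durations[name] = delta.total_seconds()
    print(f"{name}: {durations[name]:.2f} seconds")
```

The timestamps also show that the Dataset B analysis started about a millisecond after the Dataset A analysis finished, so the two profiles appear to have been computed one after the other.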

Variables

PassengerId
Real number (ℝ)

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            447.45516    452.88789
Minimum         1            2
Maximum         891          891
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     1           2
5-th percentile             41.25       53.25
Q1                          230.5       233.5
median                      450.5       470
Q3                          670.5       664.75
95-th percentile            843.75      851.5
Maximum                     891         891
Range                       890         889
Interquartile range (IQR)   440         431.25

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                258.2702        254.17386
Coefficient of variation (CV)     0.57719796      0.56122909
Kurtosis                          -1.1844795      -1.1466849
Mean                              447.45516       452.88789
Median Absolute Deviation (MAD)   220.5           215.5
Skewness                          -0.019543894    -0.032871666
Sum                               199565          201988
Variance                          66703.498       64604.351
Monotonicity                      Not monotonic   Not monotonic
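
Several of the PassengerId statistics are simple functions of others in the same tables: range = maximum - minimum, IQR = Q3 - Q1, mean = sum / n, CV = standard deviation / mean, and variance = standard deviation squared. The sketch below recomputes them from the reported inputs. It is only a cross-check of the published numbers under those textbook definitions, not a description of how ydata-profiling computes them, and the names `stats` and `derived` are made up for the example.

```python
# Values for PassengerId copied from this report: (Dataset A, Dataset B).
n = 446
stats = {
    "min": (1, 2),
    "max": (891, 891),
    "q1": (230.5, 233.5),
    "q3": (670.5, 664.75),
    "sum": (199565, 201988),
    "std": (258.2702, 254.17386),
}

derived = []
for i, name in enumerate(("Dataset A", "Dataset B")):
    mean = stats["sum"][i] / n
    row = {
        "range": stats["max"][i] - stats["min"][i],
        "iqr": stats["q3"][i] - stats["q1"][i],
        "mean": mean,
        "cv": stats["std"][i] / mean,
        "variance": stats["std"][i] ** 2,
    }
    derived.append(row)
    print(name, row)
```

The recomputed values agree with the tables up to the rounding of the printed standard deviation.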
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
191 1
 
0.2%
731 1
 
0.2%
188 1
 
0.2%
299 1
 
0.2%
606 1
 
0.2%
248 1
 
0.2%
796 1
 
0.2%
86 1
 
0.2%
420 1
 
0.2%
548 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
694 1
 
0.2%
261 1
 
0.2%
798 1
 
0.2%
444 1
 
0.2%
69 1
 
0.2%
883 1
 
0.2%
374 1
 
0.2%
24 1
 
0.2%
155 1
 
0.2%
591 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
1 1
0.2%
3 1
0.2%
5 1
0.2%
6 1
0.2%
7 1
0.2%
8 1
0.2%
9 1
0.2%
12 1
0.2%
15 1
0.2%
16 1
0.2%
ValueCountFrequency (%)
2 1
0.2%
3 1
0.2%
5 1
0.2%
6 1
0.2%
11 1
0.2%
15 1
0.2%
16 1
0.2%
21 1
0.2%
23 1
0.2%
24 1
0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   1           0
2nd row   1           1
3rd row   1           0
4th row   1           1
5th row   0           0

Common Values

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
0       282     63.2%           274     61.4%
1       164     36.8%           172     38.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   2           3
2nd row   2           1
3rd row   1           1
4th row   1           3
5th row   3           3

Common Values

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
3       245     54.9%           245     54.9%
1       106     23.8%           107     24.0%
2       95      21.3%           94      21.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
3 245
54.9%
1 106
23.8%
2 95
 
21.3%
ValueCountFrequency (%)
3 245
54.9%
1 107
24.0%
2 94
 
21.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 245
54.9%
1 106
23.8%
2 95
 
21.3%
ValueCountFrequency (%)
3 245
54.9%
1 107
24.0%
2 94
 
21.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 245
54.9%
1 106
23.8%
2 95
 
21.3%
ValueCountFrequency (%)
3 245
54.9%
1 107
24.0%
2 94
 
21.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 245
54.9%
1 106
23.8%
2 95
 
21.3%
ValueCountFrequency (%)
3 245
54.9%
1 107
24.0%
2 94
 
21.1%

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      67          67
Median length   52          49
Mean length     26.883408   26.70852
Min length      12          12

Characters and Unicode

                      Dataset A   Dataset B
Total characters      11990       11912
Distinct characters   58          59
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                      Dataset B
1st row   Pinsky, Mrs. (Rosa)            Saad, Mr. Khalil
2nd row   Sinkkonen, Miss. Anna          Carter, Mrs. William Ernest (Lucile Polk)
3rd row   Newsom, Miss. Helen Monypeny   Allison, Miss. Helen Loraine
4th row   Shutes, Miss. Elizabeth W      Barah, Mr. Hanna Assi
5th row   Cor, Mr. Liudevit              Karaic, Mr. Milan
Dataset A
Value                Count   Frequency (%)
mr                   261     14.4%
miss                 89      4.9%
mrs                  66      3.6%
john                 27      1.5%
william              24      1.3%
master               20      1.1%
henry                16      0.9%
james                14      0.8%
charles              14      0.8%
anna                 13      0.7%
Other values (888)   1269    70.0%

Dataset B
Value                Count   Frequency (%)
mr                   260     14.4%
miss                 93      5.2%
mrs                  65      3.6%
william              25      1.4%
henry                21      1.2%
john                 19      1.1%
master               15      0.8%
james                12      0.7%
mary                 12      0.7%
frederick            11      0.6%
Other values (902)   1267    70.4%
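
The word frequencies above count lowercase tokens, and each frequency is a share of all tokens: in Dataset A the listed counts sum to 1813 words, and 261 / 1813 is about 14.4% for "mr". The exact tokenizer is not shown in this report. The sketch below assumes one rule that is consistent with the tokens listed here (split on whitespace, trim surrounding punctuation, lowercase) and applies it to the five Dataset A sample names; the `tokenize` helper is hypothetical.

```python
import string
from collections import Counter

# The five Dataset A sample names shown in this section.
names = [
    "Pinsky, Mrs. (Rosa)",
    "Sinkkonen, Miss. Anna",
    "Newsom, Miss. Helen Monypeny",
    "Shutes, Miss. Elizabeth W",
    "Cor, Mr. Liudevit",
]


def tokenize(text):
    """Split on whitespace, trim surrounding punctuation, lowercase (assumed rule)."""
    tokens = (word.strip(string.punctuation).lower() for word in text.split())
    return [token for token in tokens if token]


word_counts = Counter(token for name in names for token in tokenize(name))
total_words = sum(word_counts.values())
for word, count in word_counts.most_common(3):
    print(f"{word}: {count} ({100 * count / total_words:.1f}%)")
```

On this small sample, titles such as "miss" dominate in the same way that "mr", "miss", and "mrs" lead both full tables.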

Most occurring characters

ValueCountFrequency (%)
1369
 
11.4%
r 972
 
8.1%
e 840
 
7.0%
a 835
 
7.0%
n 689
 
5.7%
i 642
 
5.4%
s 635
 
5.3%
M 570
 
4.8%
l 522
 
4.4%
o 510
 
4.3%
Other values (48) 4406
36.7%
ValueCountFrequency (%)
1356
 
11.4%
r 982
 
8.2%
a 831
 
7.0%
e 829
 
7.0%
i 679
 
5.7%
n 670
 
5.6%
s 631
 
5.3%
M 556
 
4.7%
l 509
 
4.3%
o 475
 
4.0%
Other values (49) 4394
36.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 11990
100.0%
ValueCountFrequency (%)
(unknown) 11912
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1369
 
11.4%
r 972
 
8.1%
e 840
 
7.0%
a 835
 
7.0%
n 689
 
5.7%
i 642
 
5.4%
s 635
 
5.3%
M 570
 
4.8%
l 522
 
4.4%
o 510
 
4.3%
Other values (48) 4406
36.7%
ValueCountFrequency (%)
1356
 
11.4%
r 982
 
8.2%
a 831
 
7.0%
e 829
 
7.0%
i 679
 
5.7%
n 670
 
5.6%
s 631
 
5.3%
M 556
 
4.7%
l 509
 
4.3%
o 475
 
4.0%
Other values (49) 4394
36.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 11990
100.0%
ValueCountFrequency (%)
(unknown) 11912
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1369
 
11.4%
r 972
 
8.1%
e 840
 
7.0%
a 835
 
7.0%
n 689
 
5.7%
i 642
 
5.4%
s 635
 
5.3%
M 570
 
4.8%
l 522
 
4.4%
o 510
 
4.3%
Other values (48) 4406
36.7%
ValueCountFrequency (%)
1356
 
11.4%
r 982
 
8.2%
a 831
 
7.0%
e 829
 
7.0%
i 679
 
5.7%
n 670
 
5.6%
s 631
 
5.3%
M 556
 
4.7%
l 509
 
4.3%
o 475
 
4.0%
Other values (49) 4394
36.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 11990
100.0%
ValueCountFrequency (%)
(unknown) 11912
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1369
 
11.4%
r 972
 
8.1%
e 840
 
7.0%
a 835
 
7.0%
n 689
 
5.7%
i 642
 
5.4%
s 635
 
5.3%
M 570
 
4.8%
l 522
 
4.4%
o 510
 
4.3%
Other values (48) 4406
36.7%
ValueCountFrequency (%)
1356
 
11.4%
r 982
 
8.2%
a 831
 
7.0%
e 829
 
7.0%
i 679
 
5.7%
n 670
 
5.6%
s 631
 
5.3%
M 556
 
4.7%
l 509
 
4.3%
o 475
 
4.0%
Other values (49) 4394
36.9%

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7040359   4.7174888
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2098        2104
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
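
For a categorical variable with few values, the length and character totals follow directly from the category counts. As a worked check for Sex, the sketch below derives the total number of characters and the mean length from the counts of "male" and "female" reported in this section. Variable names are illustrative only.

```python
# Category counts for Sex copied from this report: (Dataset A, Dataset B).
category_counts = {"male": (289, 286), "female": (157, 160)}

length_stats = []
for i, name in enumerate(("Dataset A", "Dataset B")):
    n_rows = sum(counts[i] for counts in category_counts.values())
    total_chars = sum(
        len(value) * counts[i] for value, counts in category_counts.items()
    )
    mean_length = total_chars / n_rows
    length_stats.append((total_chars, mean_length))
    print(f"{name}: total characters = {total_chars}, mean length = {mean_length:.7f}")
```

The results (2098 and 2104 characters, mean lengths of about 4.704 and 4.717) agree with the Length and Characters tables.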

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   female      male
2nd row   female      female
3rd row   female      female
4th row   female      male
5th row   male        male

Common Values

         Dataset A               Dataset B
Value    Count   Frequency (%)   Count   Frequency (%)
male     289     64.8%           286     64.1%
female   157     35.2%           160     35.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
e 603
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 157
 
7.5%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2098
100.0%
ValueCountFrequency (%)
(unknown) 2104
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 603
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 157
 
7.5%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2098
100.0%
ValueCountFrequency (%)
(unknown) 2104
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 603
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 157
 
7.5%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2098
100.0%
ValueCountFrequency (%)
(unknown) 2104
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 603
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 157
 
7.5%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Age
Real number (ℝ)

                Dataset A    Dataset B
Distinct        77           71
Distinct (%)    21.9%        19.7%
Missing         95           86
Missing (%)     21.3%        19.3%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            30.184245    30.20025
Minimum         0.75         0.42
Maximum         71           74
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
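
The two percentages for Age appear to use different denominators: Missing (%) matches the missing count divided by all 446 rows, while Distinct (%) matches the distinct count divided by the non-missing rows only. This reading is inferred from the numbers in this report, not taken from the ydata-profiling documentation, so treat the sketch below as a plausibility check with illustrative names.

```python
n_rows = 446
# (distinct, missing) for Age, copied from this report.
age = {"Dataset A": (77, 95), "Dataset B": (71, 86)}

age_pct = {}
for name, (distinct, missing) in age.items():
    non_missing = n_rows - missing
    missing_pct = round(100 * missing / n_rows, 1)
    distinct_pct = round(100 * distinct / non_missing, 1)
    age_pct[name] = (missing_pct, distinct_pct)
    print(f"{name}: Missing {missing_pct}%, Distinct {distinct_pct}%")
```

The same pattern holds for Cabin, where 82 distinct values over 96 non-missing rows gives the reported 85.4% for Dataset A.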

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.75        0.42
5-th percentile             8           5.95
Q1                          21          21
median                      29          29
Q3                          38          38.25
95-th percentile            56          55.525
Maximum                     71          74
Range                       70.25       73.58
Interquartile range (IQR)   17          17.25

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                13.761          14.318968
Coefficient of variation (CV)     0.45590009      0.47413408
Kurtosis                          0.062787111     0.2474339
Mean                              30.184245       30.20025
Median Absolute Deviation (MAD)   8               8
Skewness                          0.33499726      0.42833798
Sum                               10594.67        10872.09
Variance                          189.36513       205.03284
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
22 20
 
4.5%
30 16
 
3.6%
29 14
 
3.1%
24 13
 
2.9%
18 13
 
2.9%
31 12
 
2.7%
27 12
 
2.7%
36 12
 
2.7%
25 11
 
2.5%
19 11
 
2.5%
Other values (67) 217
48.7%
(Missing) 95
21.3%
ValueCountFrequency (%)
22 18
 
4.0%
24 17
 
3.8%
21 15
 
3.4%
32 14
 
3.1%
18 13
 
2.9%
28 12
 
2.7%
35 12
 
2.7%
30 12
 
2.7%
36 11
 
2.5%
26 11
 
2.5%
Other values (61) 225
50.4%
(Missing) 86
 
19.3%
ValueCountFrequency (%)
0.75 1
 
0.2%
0.92 1
 
0.2%
1 3
0.7%
2 5
1.1%
3 2
 
0.4%
4 3
0.7%
5 1
 
0.2%
7 1
 
0.2%
8 3
0.7%
9 1
 
0.2%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.67 1
 
0.2%
1 3
0.7%
2 5
1.1%
3 2
 
0.4%
4 4
0.9%
5 2
 
0.4%
6 2
 
0.4%
7 2
 
0.4%
8 2
 
0.4%

SibSp
Real number (ℝ)

                Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.51345291    0.5
Minimum         0             0
Maximum         8             8
Zeros           306           307
Zeros (%)       68.6%         68.8%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            3           2
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.0866082       1.0679183
Coefficient of variation (CV)     2.1162763       2.1358365
Kurtosis                          17.276864       18.684215
Mean                              0.51345291      0.5
Median Absolute Deviation (MAD)   0               0
Skewness                          3.6337471       3.7698908
Sum                               229             223
Variance                          1.1807175       1.1404494
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 306
68.6%
1 106
 
23.8%
3 10
 
2.2%
2 9
 
2.0%
4 9
 
2.0%
5 3
 
0.7%
8 3
 
0.7%
ValueCountFrequency (%)
0 307
68.8%
1 106
 
23.8%
2 12
 
2.7%
4 9
 
2.0%
3 6
 
1.3%
5 3
 
0.7%
8 3
 
0.7%
ValueCountFrequency (%)
0 306
68.6%
1 106
 
23.8%
2 9
 
2.0%
3 10
 
2.2%
4 9
 
2.0%
5 3
 
0.7%
8 3
 
0.7%
ValueCountFrequency (%)
0 307
68.8%
1 106
 
23.8%
2 12
 
2.7%
3 6
 
1.3%
4 9
 
2.0%
5 3
 
0.7%
8 3
 
0.7%

Parch
Real number (ℝ)

                Dataset A     Dataset B
Distinct        6             7
Distinct (%)    1.3%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.39013453    0.38116592
Minimum         0             0
Maximum         5             6
Zeros           340           341
Zeros (%)       76.2%         76.5%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     5           6
Range                       5           6
Interquartile range (IQR)   0           0

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.84283709      0.81447825
Coefficient of variation (CV)     2.1603755       2.1368076
Kurtosis                          9.632618        10.684819
Mean                              0.39013453      0.38116592
Median Absolute Deviation (MAD)   0               0
Skewness                          2.8258645       2.8314936
Sum                               174             170
Variance                          0.71037436      0.66337482
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
0 340
76.2%
1 59
 
13.2%
2 38
 
8.5%
4 4
 
0.9%
5 4
 
0.9%
3 1
 
0.2%
ValueCountFrequency (%)
0 341
76.5%
1 56
 
12.6%
2 42
 
9.4%
5 2
 
0.4%
4 2
 
0.4%
3 2
 
0.4%
6 1
 
0.2%
ValueCountFrequency (%)
0 340
76.2%
1 59
 
13.2%
2 38
 
8.5%
3 1
 
0.2%
4 4
 
0.9%
5 4
 
0.9%
ValueCountFrequency (%)
0 341
76.5%
1 56
 
12.6%
2 42
 
9.4%
3 2
 
0.4%
4 2
 
0.4%
5 2
 
0.4%
6 1
 
0.2%

Ticket
Text

               Dataset A   Dataset B
Distinct       382         384
Distinct (%)   85.7%       86.1%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.8766816   6.9349776
Min length      4           3

Characters and Unicode

                      Dataset A   Dataset B
Total characters      3067        3093
Distinct characters   32          35
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       334         336
Unique (%)   74.9%       75.3%

Sample

          Dataset A   Dataset B
1st row   234604      2672
2nd row   250648      113760
3rd row   11752       113781
4th row   PC 17582    2663
5th row   349231      349246
ValueCountFrequency (%)
pc 30
 
5.3%
c.a 14
 
2.5%
a/5 9
 
1.6%
ston/o 7
 
1.2%
2 7
 
1.2%
w./c 6
 
1.1%
ca 6
 
1.1%
sc/paris 6
 
1.1%
soton/o.q 5
 
0.9%
soton/oq 5
 
0.9%
Other values (401) 476
83.4%
ValueCountFrequency (%)
pc 30
 
5.2%
c.a 16
 
2.8%
ston/o 9
 
1.6%
2 9
 
1.6%
ca 8
 
1.4%
a/5 8
 
1.4%
w./c 7
 
1.2%
soton/oq 6
 
1.0%
soton/o.q 5
 
0.9%
2144 4
 
0.7%
Other values (402) 476
82.4%

Most occurring characters

ValueCountFrequency (%)
3 374
12.2%
1 366
11.9%
2 300
9.8%
7 236
 
7.7%
4 221
 
7.2%
0 211
 
6.9%
6 196
 
6.4%
5 188
 
6.1%
9 172
 
5.6%
8 150
 
4.9%
Other values (22) 653
21.3%
ValueCountFrequency (%)
3 381
12.3%
1 353
11.4%
2 295
9.5%
4 236
 
7.6%
7 233
 
7.5%
0 217
 
7.0%
6 212
 
6.9%
5 187
 
6.0%
9 169
 
5.5%
8 151
 
4.9%
Other values (25) 659
21.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3067
100.0%
ValueCountFrequency (%)
(unknown) 3093
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 374
12.2%
1 366
11.9%
2 300
9.8%
7 236
 
7.7%
4 221
 
7.2%
0 211
 
6.9%
6 196
 
6.4%
5 188
 
6.1%
9 172
 
5.6%
8 150
 
4.9%
Other values (22) 653
21.3%
ValueCountFrequency (%)
3 381
12.3%
1 353
11.4%
2 295
9.5%
4 236
 
7.6%
7 233
 
7.5%
0 217
 
7.0%
6 212
 
6.9%
5 187
 
6.0%
9 169
 
5.5%
8 151
 
4.9%
Other values (25) 659
21.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3067
100.0%
ValueCountFrequency (%)
(unknown) 3093
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 374
12.2%
1 366
11.9%
2 300
9.8%
7 236
 
7.7%
4 221
 
7.2%
0 211
 
6.9%
6 196
 
6.4%
5 188
 
6.1%
9 172
 
5.6%
8 150
 
4.9%
Other values (22) 653
21.3%
ValueCountFrequency (%)
3 381
12.3%
1 353
11.4%
2 295
9.5%
4 236
 
7.6%
7 233
 
7.5%
0 217
 
7.0%
6 212
 
6.9%
5 187
 
6.0%
9 169
 
5.5%
8 151
 
4.9%
Other values (25) 659
21.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3067
100.0%
ValueCountFrequency (%)
(unknown) 3093
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 374
12.2%
1 366
11.9%
2 300
9.8%
7 236
 
7.7%
4 221
 
7.2%
0 211
 
6.9%
6 196
 
6.4%
5 188
 
6.1%
9 172
 
5.6%
8 150
 
4.9%
Other values (22) 653
21.3%
ValueCountFrequency (%)
3 381
12.3%
1 353
11.4%
2 295
9.5%
4 236
 
7.6%
7 233
 
7.5%
0 217
 
7.0%
6 212
 
6.9%
5 187
 
6.0%
9 169
 
5.5%
8 151
 
4.9%
Other values (25) 659
21.3%

Fare
Real number (ℝ)

                Dataset A    Dataset B
Distinct        182          176
Distinct (%)    40.8%        39.5%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            31.973795    31.068908
Minimum         0            0
Maximum         512.3292     263
Zeros           6            7
Zeros (%)       1.3%         1.6%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.125       7.22605
Q1                          7.8958      7.925
median                      14.4542     14.2
Q3                          30.64685    30.64685
95-th percentile            112.67708   110.38748
Maximum                     512.3292    263
Range                       512.3292    263
Interquartile range (IQR)   22.75105    22.72185

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                48.505993       41.716614
Coefficient of variation (CV)     1.5170546       1.3427126
Kurtosis                          27.743676       11.794948
Mean                              31.973795       31.068908
Median Absolute Deviation (MAD)   7.2042          6.65
Skewness                          4.3710261       3.1220823
Sum                               14260.312       13856.733
Variance                          2352.8313       1740.2759
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
13 18
 
4.0%
7.8958 18
 
4.0%
8.05 17
 
3.8%
10.5 16
 
3.6%
7.75 15
 
3.4%
26 12
 
2.7%
7.2292 10
 
2.2%
7.25 9
 
2.0%
8.6625 8
 
1.8%
7.775 8
 
1.8%
Other values (172) 315
70.6%
ValueCountFrequency (%)
8.05 24
 
5.4%
7.8958 22
 
4.9%
26 19
 
4.3%
7.75 19
 
4.3%
13 18
 
4.0%
10.5 14
 
3.1%
7.925 11
 
2.5%
7.8542 9
 
2.0%
7.775 8
 
1.8%
8.6625 7
 
1.6%
Other values (166) 295
66.1%
ValueCountFrequency (%)
0 6
1.3%
6.2375 1
 
0.2%
6.4375 1
 
0.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.8583 1
 
0.2%
6.975 2
 
0.4%
7.0458 1
 
0.2%
7.05 4
0.9%
ValueCountFrequency (%)
0 7
1.6%
6.4958 1
 
0.2%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 1
 
0.2%
7.05 4
0.9%
7.125 2
 
0.4%
7.225 6
1.3%
7.2292 5
1.1%
7.25 7
1.6%

Cabin
Text

               Dataset A   Dataset B
Distinct       82          78
Distinct (%)   85.4%       80.4%
Missing        350         349
Missing (%)    78.5%       78.3%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      11          15
Median length   3           3
Mean length     3.5208333   3.7113402
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      338         360
Distinct characters   18          18
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       69          61
Unique (%)   71.9%       62.9%

Sample

          Dataset A   Dataset B
1st row   D47         B96 B98
2nd row   C125        C22 C26
3rd row   E34         B41
4th row   C2          G6
5th row   D26         C124
ValueCountFrequency (%)
c23 3
 
2.7%
c27 3
 
2.7%
f 3
 
2.7%
c25 3
 
2.7%
d35 2
 
1.8%
c26 2
 
1.8%
c22 2
 
1.8%
b5 2
 
1.8%
c68 2
 
1.8%
c2 2
 
1.8%
Other values (80) 87
78.4%
ValueCountFrequency (%)
b96 4
 
3.5%
b98 4
 
3.5%
b63 2
 
1.7%
e101 2
 
1.7%
c26 2
 
1.7%
c22 2
 
1.7%
b22 2
 
1.7%
d26 2
 
1.7%
e44 2
 
1.7%
c2 2
 
1.7%
Other values (77) 91
79.1%

Most occurring characters

ValueCountFrequency (%)
C 41
12.1%
2 39
11.5%
3 28
 
8.3%
1 26
 
7.7%
4 22
 
6.5%
B 22
 
6.5%
6 21
 
6.2%
5 18
 
5.3%
7 16
 
4.7%
8 16
 
4.7%
Other values (8) 89
26.3%
ValueCountFrequency (%)
2 40
11.1%
B 38
10.6%
C 36
10.0%
6 31
 
8.6%
1 27
 
7.5%
3 23
 
6.4%
8 21
 
5.8%
9 20
 
5.6%
E 19
 
5.3%
7 19
 
5.3%
Other values (8) 86
23.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 338
100.0%
ValueCountFrequency (%)
(unknown) 360
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
C 41
12.1%
2 39
11.5%
3 28
 
8.3%
1 26
 
7.7%
4 22
 
6.5%
B 22
 
6.5%
6 21
 
6.2%
5 18
 
5.3%
7 16
 
4.7%
8 16
 
4.7%
Other values (8) 89
26.3%
ValueCountFrequency (%)
2 40
11.1%
B 38
10.6%
C 36
10.0%
6 31
 
8.6%
1 27
 
7.5%
3 23
 
6.4%
8 21
 
5.8%
9 20
 
5.6%
E 19
 
5.3%
7 19
 
5.3%
Other values (8) 86
23.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 338
100.0%
ValueCountFrequency (%)
(unknown) 360
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
C 41
12.1%
2 39
11.5%
3 28
 
8.3%
1 26
 
7.7%
4 22
 
6.5%
B 22
 
6.5%
6 21
 
6.2%
5 18
 
5.3%
7 16
 
4.7%
8 16
 
4.7%
Other values (8) 89
26.3%
ValueCountFrequency (%)
2 40
11.1%
B 38
10.6%
C 36
10.0%
6 31
 
8.6%
1 27
 
7.5%
3 23
 
6.4%
8 21
 
5.8%
9 20
 
5.6%
E 19
 
5.3%
7 19
 
5.3%
Other values (8) 86
23.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 338
100.0%
ValueCountFrequency (%)
(unknown) 360
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
C 41
12.1%
2 39
11.5%
3 28
 
8.3%
1 26
 
7.7%
4 22
 
6.5%
B 22
 
6.5%
6 21
 
6.2%
5 18
 
5.3%
7 16
 
4.7%
8 16
 
4.7%
Other values (8) 89
26.3%
ValueCountFrequency (%)
2 40
11.1%
B 38
10.6%
C 36
10.0%
6 31
 
8.6%
1 27
 
7.5%
3 23
 
6.4%
8 21
 
5.8%
9 20
 
5.6%
E 19
 
5.3%
7 19
 
5.3%
Other values (8) 86
23.9%

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        2           1
Missing (%)    0.4%        0.2%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      444         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           C
2nd row   S           S
3rd row   S           S
4th row   S           C
5th row   S           S

Common Values

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
S           313     70.2%           324     72.6%
C           90      20.2%           79      17.7%
Q           41      9.2%            42      9.4%
(Missing)   2       0.4%            1       0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
s 313
70.5%
c 90
 
20.3%
q 41
 
9.2%
ValueCountFrequency (%)
s 324
72.8%
c 79
 
17.8%
q 42
 
9.4%

Most occurring characters

ValueCountFrequency (%)
S 313
70.5%
C 90
 
20.3%
Q 41
 
9.2%
ValueCountFrequency (%)
S 324
72.8%
C 79
 
17.8%
Q 42
 
9.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 444
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
S 313
70.5%
C 90
 
20.3%
Q 41
 
9.2%
ValueCountFrequency (%)
S 324
72.8%
C 79
 
17.8%
Q 42
 
9.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 444
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
S 313
70.5%
C 90
 
20.3%
Q 41
 
9.2%
ValueCountFrequency (%)
S 324
72.8%
C 79
 
17.8%
Q 42
 
9.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 444
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
S 313
70.5%
C 90
 
20.3%
Q 41
 
9.2%
ValueCountFrequency (%)
S 324
72.8%
C 79
 
17.8%
Q 42
 
9.4%

Interactions

[Figures: 25 pairs of interaction plots, each pair showing one plot for Dataset A and one for Dataset B]

Missing values

Figure (Dataset A, Dataset B): A simple visualization of nullity by column.

Figure (Dataset A, Dataset B): The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Figure (Dataset A, Dataset B): The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
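
The nullity correlation described above can be approximated with plain pandas. The following is a minimal sketch, not necessarily how the report computes its heatmap: it assumes the correlation is the Pearson correlation between per-column missingness indicators, and it drops columns that are either fully present or fully missing because their correlation is undefined. The function name `nullity_correlation` and the toy data are illustrative only.

```python
import numpy as np
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between the missingness indicators of each column."""
    mask = df.isnull()
    # A column with no missing values (or only missing values) has zero
    # variance in its indicator, so it is excluded from the result.
    varying = mask.loc[:, mask.nunique() > 1]
    return varying.astype(float).corr()


# Toy frame (hypothetical values): Age and Cabin are missing on the same rows,
# Fare is missing on a different row, Sex is never missing.
toy = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 26.0, np.nan],
        "Cabin": ["C85", np.nan, "E46", np.nan],
        "Fare": [7.25, 71.28, np.nan, 8.05],
        "Sex": ["male", "female", "female", "male"],
    }
)

corr = nullity_correlation(toy)
print(corr.round(3))
```

A value near 1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.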

Sample

Dataset A

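(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
190 | 191 | 1 | 2 | Pinsky, Mrs. (Rosa) | female | 32.0 | 0 | 0 | 234604 | 13.0000 | NaN | S
747 | 748 | 1 | 2 | Sinkkonen, Miss. Anna | female | 30.0 | 0 | 0 | 250648 | 13.0000 | NaN | S
136 | 137 | 1 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S
609 | 610 | 1 | 1 | Shutes, Miss. Elizabeth W | female | 40.0 | 0 | 0 | PC 17582 | 153.4625 | C125 | S
646 | 647 | 0 | 3 | Cor, Mr. Liudevit | male | 19.0 | 0 | 0 | 349231 | 7.8958 | NaN | S
319 | 320 | 1 | 1 | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) | female | 40.0 | 1 | 1 | 16966 | 134.5000 | E34 | C
336 | 337 | 0 | 1 | Pears, Mr. Thomas Clinton | male | 29.0 | 1 | 0 | 113776 | 66.6000 | C2 | S
197 | 198 | 0 | 3 | Olsen, Mr. Karl Siegwart Andreas | male | 42.0 | 0 | 1 | 4579 | 8.4042 | NaN | S
663 | 664 | 0 | 3 | Coleff, Mr. Peju | male | 36.0 | 0 | 0 | 349210 | 7.4958 | NaN | S
178 | 179 | 0 | 2 | Hale, Mr. Reginald | male | 30.0 | 0 | 0 | 250653 | 13.0000 | NaN | S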

Dataset B

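(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
693 | 694 | 0 | 3 | Saad, Mr. Khalil | male | 25.0 | 0 | 0 | 2672 | 7.2250 | NaN | C
763 | 764 | 1 | 1 | Carter, Mrs. William Ernest (Lucile Polk) | female | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
297 | 298 | 0 | 1 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S
762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C
606 | 607 | 0 | 3 | Karaic, Mr. Milan | male | 30.0 | 0 | 0 | 349246 | 7.8958 | NaN | S
400 | 401 | 1 | 3 | Niskanen, Mr. Juha | male | 39.0 | 0 | 0 | STON/O 2. 3101289 | 7.9250 | NaN | S
587 | 588 | 1 | 1 | Frolicher-Stehli, Mr. Maxmillian | male | 60.0 | 1 | 1 | 13567 | 79.2000 | B41 | C
514 | 515 | 0 | 3 | Coleff, Mr. Satio | male | 24.0 | 0 | 0 | 349209 | 7.4958 | NaN | S
105 | 106 | 0 | 3 | Mionoff, Mr. Stoytcho | male | 28.0 | 0 | 0 | 349207 | 7.8958 | NaN | S
29 | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S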

Dataset A

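(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
433 | 434 | 0 | 3 | Kallio, Mr. Nikolai Erland | male | 17.0 | 0 | 0 | STON/O 2. 3101274 | 7.1250 | NaN | S
287 | 288 | 0 | 3 | Naidenoff, Mr. Penko | male | 22.0 | 0 | 0 | 349206 | 7.8958 | NaN | S
577 | 578 | 1 | 1 | Silvey, Mrs. William Baird (Alice Munger) | female | 39.0 | 1 | 0 | 13507 | 55.9000 | E44 | S
105 | 106 | 0 | 3 | Mionoff, Mr. Stoytcho | male | 28.0 | 0 | 0 | 349207 | 7.8958 | NaN | S
602 | 603 | 0 | 1 | Harrington, Mr. Charles H | male | NaN | 0 | 0 | 113796 | 42.4000 | NaN | S
702 | 703 | 0 | 3 | Barbara, Miss. Saiide | female | 18.0 | 0 | 1 | 2691 | 14.4542 | NaN | C
138 | 139 | 0 | 3 | Osen, Mr. Olaf Elon | male | 16.0 | 0 | 0 | 7534 | 9.2167 | NaN | S
464 | 465 | 0 | 3 | Maisner, Mr. Simon | male | NaN | 0 | 0 | A/S 2816 | 8.0500 | NaN | S
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S
708 | 709 | 1 | 1 | Cleaver, Miss. Alice | female | 22.0 | 0 | 0 | 113781 | 151.5500 | NaN | S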

Dataset B

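(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
536 | 537 | 0 | 1 | Butt, Major. Archibald Willingham | male | 45.0 | 0 | 0 | 113050 | 26.5500 | B38 | S
190 | 191 | 1 | 2 | Pinsky, Mrs. (Rosa) | female | 32.0 | 0 | 0 | 234604 | 13.0000 | NaN | S
411 | 412 | 0 | 3 | Hart, Mr. Henry | male | NaN | 0 | 0 | 394140 | 6.8583 | NaN | Q
481 | 482 | 0 | 2 | Frost, Mr. Anthony Wood "Archie" | male | NaN | 0 | 0 | 239854 | 0.0000 | NaN | S
340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.0 | 1 | 1 | 230080 | 26.0000 | F2 | S
61 | 62 | 1 | 1 | Icard, Miss. Amelie | female | 38.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN
857 | 858 | 1 | 1 | Daly, Mr. Peter Denis | male | 51.0 | 0 | 0 | 113055 | 26.5500 | E17 | S
506 | 507 | 1 | 2 | Quick, Mrs. Frederick Charles (Jane Richards) | female | 33.0 | 0 | 2 | 26360 | 26.0000 | NaN | S
708 | 709 | 1 | 1 | Cleaver, Miss. Alice | female | 22.0 | 0 | 0 | 113781 | 151.5500 | NaN | S
187 | 188 | 1 | 1 | Romaine, Mr. Charles Hallace ("Mr C Rolmane") | male | 45.0 | 0 | 0 | 111428 | 26.5500 | NaN | S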

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
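
A duplicate-row check of this kind can be reproduced with pandas. The sketch below is an assumption about how one might verify the result independently, not the report's own implementation: it treats a row as a duplicate when every column value matches an earlier row, and the helper name `duplicate_summary` and the toy data are hypothetical.

```python
import pandas as pd


def duplicate_summary(df: pd.DataFrame) -> tuple[int, float]:
    """Return (number of duplicate rows, percentage of all rows)."""
    # keep="first" marks every repeat of an earlier row, but not the first copy.
    n_dup = int(df.duplicated(keep="first").sum())
    pct = 100.0 * n_dup / len(df) if len(df) else 0.0
    return n_dup, pct


# Toy frame (hypothetical values): the last row repeats the first one.
toy = pd.DataFrame(
    {
        "Pclass": [3, 1, 3, 3],
        "Sex": ["male", "female", "female", "male"],
        "Fare": [7.25, 71.28, 7.92, 7.25],
    }
)

n_dup, pct = duplicate_summary(toy)
print(n_dup, pct)
```

Any frame that contains a column of unique values, such as the PassengerId column flagged as unique in the alerts, cannot contain fully duplicated rows, which is consistent with the zero counts reported for both datasets.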